NSF PAR Search | NSF Public Access Repository

Maximum Coverage in Turnstile Streams with Applications to Fingerprinting Measures

Ene, Alina; Epasto, Alessandro; Mirrokni, Vahab; Nguyen, Hoai-An; Nguyen, Huy L; Woodruff, David P; Zhong, Peilin (July 2025, Proceedings of Machine Learning Research)

In the maximum coverage problem, we are given d subsets of a universe [n], and the goal is to output k subsets whose union covers the largest possible number of distinct items. We present the first algorithm for maximum coverage in the turnstile streaming model, where updates that insert or delete an item from a subset arrive one by one. Notably, our algorithm uses only polylog(n) update time. We also present turnstile streaming algorithms for targeted and general fingerprinting for risk management, where the goal is to determine which features pose the greatest re-identification risk in a dataset. As part of our work, we give a result of independent interest: an algorithm to estimate the complement of the pth frequency moment of a vector for p ≥ 2. Empirical evaluation confirms the practicality of our fingerprinting algorithms, demonstrating a speedup of up to 210x over prior work.
Free, publicly-accessible full text available July 13, 2026
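
To make the maximum coverage objective in the abstract above concrete, here is a minimal offline sketch in Python. It uses the classical greedy heuristic, which is an assumption for illustration only: it is not the turnstile streaming algorithm from the paper, it handles no deletions, and the function name `greedy_max_coverage` and the toy data are hypothetical.

```python
def greedy_max_coverage(subsets, k):
    """Greedily pick up to k subsets, each time taking the one that
    covers the most items not yet covered (offline baseline only)."""
    covered = set()
    chosen = []
    remaining = set(range(len(subsets)))
    for _ in range(min(k, len(subsets))):
        # sorted() makes tie-breaking deterministic: lowest index wins.
        best = max(sorted(remaining), key=lambda i: len(subsets[i] - covered))
        if not subsets[best] - covered:
            break  # no remaining subset adds a new item
        chosen.append(best)
        covered |= subsets[best]
        remaining.remove(best)
    return chosen, covered


# Toy instance: d = 4 subsets of the universe {1, ..., 7}, k = 2.
subsets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
chosen, covered = greedy_max_coverage(subsets, 2)
print(chosen, sorted(covered))
```

On this toy instance the two chosen subsets together cover all seven items.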
Retraining with Predicted Hard Labels Provably Increases Model Accuracy

Das, Rudrajit; Dhillon, Inderjit S; Epasto, Alessandro; Javanmard, Adel; Mao, Jieming; Mirrokni, Vahab; Sanghavi, Sujay; Zhong, Peilin (May 2025, https://doi.org/10.48550/arXiv.2406.11206)

Training with noisy labels often yields suboptimal performance, but retraining a model with its own predicted hard labels (binary 1/0 outputs) has been empirically shown to improve accuracy. This paper provides the first theoretical characterization of this phenomenon. In the setting of linearly separable binary classification with randomly corrupted labels, the authors prove that retraining can indeed improve the population accuracy compared to initial training with noisy labels. Retraining also has practical implications for local label differential privacy (DP), where models are trained with noisy labels. The authors propose consensus-based retraining, where retraining is done selectively on samples for which the predicted label matches the given noisy label. This approach significantly improves DP training accuracy at no additional privacy cost. For example, training ResNet-18 on CIFAR-100 with ε = 3 label DP achieves over 6% accuracy improvement with consensus-based retraining.
Free, publicly-accessible full text available May 7, 2026
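
The selection step of consensus-based retraining, as described in the abstract above, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `consensus_subset` and the toy data are hypothetical, and the sketch covers only the filtering of samples whose predicted label matches the given noisy label, not model training or the label DP mechanism.

```python
def consensus_subset(samples, noisy_labels, predicted_labels):
    """Keep only the samples whose predicted hard label agrees with
    the given noisy label; retraining would then use this subset."""
    if not (len(samples) == len(noisy_labels) == len(predicted_labels)):
        raise ValueError("all inputs must have the same length")
    return [
        (x, y)
        for x, y, p in zip(samples, noisy_labels, predicted_labels)
        if y == p
    ]


# Toy data: four samples with noisy labels and model predictions.
samples = ["a", "b", "c", "d"]
noisy_labels = [1, 0, 1, 0]
predicted_labels = [1, 1, 1, 0]
kept = consensus_subset(samples, noisy_labels, predicted_labels)
print(kept)
```

Sample "b" is dropped because its prediction disagrees with its noisy label.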
Massively Parallel Tree Embeddings for High Dimensional Spaces

https://doi.org/10.1145/3558481.3591096

Ahanchi, AmirMohsen; Andoni, Alexandr; Hajiaghayi, MohammadTaghi; Knittel, Marina; Zhong, Peilin (January 2023, SPAA '23: Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures)

Full Text Available
Parallel approximate undirected shortest paths via low hop emulators

https://doi.org/10.1145/3357713.3384321

Andoni, Alexandr; Stein, Clifford; Zhong, Peilin (June 2020, Symposium on Theory of Computing (STOC))
Full Text Available
Connected Components on a PRAM in Log Diameter Time

https://doi.org/10.1145/3350755.3400249

Liu, Sixue Cliff; Tarjan, Robert E.; Zhong, Peilin (July 2020, 32nd ACM Symposium on Parallelism in Algorithms and Architectures)
Full Text Available
Enhancing Adversarial Defense by k-Winners-Take-All

Xiao, Chang; Zhong, Peilin; Zheng, Changxi (January 2020, 8th International Conference on Learning Representations)
Full Text Available
Enhancing Adversarial Defense by k-Winners-Take-All

Xiao, Chang; Zhong, Peilin; Zheng, Changxi (January 2020, International Conference on Learning Representations)

We propose a simple change to existing neural network structures for better defending against gradient-based adversarial attacks. Instead of using popular activation functions (such as ReLU), we advocate the use of k-Winners-Take-All (k-WTA) activation, a C^0 discontinuous function that purposely invalidates the neural network model's gradient at densely distributed input data points. The proposed k-WTA activation can be readily used in nearly all existing networks and training methods with no significant overhead. Our proposal is theoretically rationalized. We analyze why the discontinuities in k-WTA networks can largely prevent gradient-based search of adversarial examples and why they at the same time remain innocuous to the network training. This understanding is also empirically backed. We test k-WTA activation on various network structures optimized by a training method, be it adversarial training or not. In all cases, the robustness of k-WTA networks outperforms that of traditional networks under white-box attacks.
Full Text Available
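
As a rough illustration of the k-WTA activation named in the abstract above, the sketch below keeps the k largest entries of a vector and zeroes the rest. It is a plain-Python sketch under stated assumptions, not the authors' implementation: the function name `k_wta` is hypothetical, and ties at the threshold are all kept, so more than k entries can survive when values repeat.

```python
def k_wta(values, k):
    """Keep the k largest entries of `values` and set the others to 0.0.
    Entries equal to the k-th largest value are all kept."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    # The k-th largest value serves as the cutoff.
    threshold = sorted(values, reverse=True)[k - 1]
    return [v if v >= threshold else 0.0 for v in values]


activations = [0.5, -1.0, 2.0, 0.1]
print(k_wta(activations, 2))
```

A small change in the input can swap which entries fall inside the top k, so the output jumps; this is the discontinuity the abstract refers to.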
Planning with General Objective Functions: Going Beyond Total Rewards

Wang, Ruosong; Zhong, Peilin; Du, Simon S; Salakhutdinov, Russ R; Yang, Lin F. (January 2020, Annual Conference on Neural Information Processing Systems)
Full Text Available
Log Diameter Rounds Algorithms for 2-Vertex and 2-Edge Connectivity

Andoni, Alexandr; Stein, Clifford; Zhong, Peilin (January 2019, International Colloquium on Automata, Languages, and Programming)

Full Text Available
Log Diameter Rounds Algorithms for 2-Vertex and 2-Edge Connectivity

https://doi.org/10.4230/LIPIcs.ICALP.2019.14

Andoni, Alexandr; Stein, Clifford; Zhong, Peilin (January 2019, 46th International Colloquium on Automata, Languages, and Programming)

Full Text Available
